Performance Impacts of Non-blocking Caches in Out-of-order Processors
نویسندگان
چکیده
Performance Impacts of Non-blocking Caches in Out-of-order Processors Sheng Li; Ke Chen; Jay B. Brockman; Norman P. Jouppi HP Laboratories HPL-2011-65 Non-blocking cache; MSHR; Out-of-order Processors Non-blocking caches are an effective technique for tolerating cache-miss latency. They can reduce miss-induced processor stalls by buffering the misses and continuing to serve other independent access requests. Previous research on the complexity and performance of non-blocking caches supporting non-blocking loads showed they could achieve significant performance gains in comparison to blocking caches. However, those experiments were performed with benchmarks that are now over a decade old. Furthermore the processor that was simulated was a single-issue processor with unlimited run-ahead capability, a perfect branch predictor, fixed 16-cycle memory latency, single-cycle latency for floating point operations, and write-through and write-no-allocate caches. These assumptions are very different from today's high performance out-of-order processors such as the Intel Nehalem. Thus, it is time to re-evaluate the performance impact of non-blocking caches on practical out-of-order processors using up-to-date benchmarks. In this study, we evaluate the impacts of non-blocking data caches using the latest SPECCPU2006 benchmark suite on practical high performance out-of-order (OOO) processors. Simulations show that a data cache that supports hit-under-2-misses can provide a 17.76% performance gain for a typical high performance OOO processor running the SPECCPU 2006 benchmarks in comparison to a similar machine with a blocking cache. External Posting Date: July 06, 2011 [Fulltext] Approved for External Publication Internal Posting Date: July 06, 2011 [Fulltext] Copyright 2011 Hewlett-Packard Development Company, L.P. 1 Performance Impacts of Non-blocking Caches in Out-of-order Processors Sheng Li, Ke Chen, Jay B. Brockman, Norman P. Jouppi Hewlett-Packard Labs, University of Notre Dame † {sheng.li4, norm.jouppi}@hp.com, ‡ {kchen2, jbb}@nd.edu
منابع مشابه
NCOR: An FPGA-Friendly Nonblocking Data Cache for Soft Processors with Runahead Execution
Soft processors often use data caches to reduce the gap between processor and main memory speeds. To achieve high efficiency, simple, blocking caches are used. Such caches are not appropriate for processor designs such as Runahead and out-of-order execution that require nonblocking caches to tolerate main memory latencies. Instead, these processors use non-blocking caches to extract memory leve...
متن کاملAligned Scheduling: Cache-Efficient Instruction Scheduling for VLIW Processors
The performance of statically scheduled VLIW processors is highly sensitive to the instruction scheduling performed by the compiler. In this work we identify a major deficiency in existing instruction scheduling for VLIW processors. Unlike most dynamically scheduled processors, a VLIW processor with no load-use hardware interlocks will completely stall upon a cache-miss of any of the operations...
متن کاملPractical Precise Evaluation of Cache Effects on Low Level Embedded Vliw Computing
The introduction of caches inside high performance processors provides technical ways to reduce the memory gap by tolerating longmemory access delays. While such intermediate fast caches accelerate program execution in general, they have a negative impact on the predictability of program performances. This lack of performance stability is a non-desirable characteristic for embedded computing. W...
متن کاملSimplifying Hardware for Out Of Order Execution using the Decoupling Paradigm
Future hardware and software technology will try to provide improved performance by extracting higher levels of parallelism. However the cost of a main memory access-in terms of missed instruction issue slots-increases with faster processors and greater issue widths. For this reason latency hiding technology remains one of the most important parts of high performance processor designs. In this ...
متن کاملEffects of Main Memory Latencies on the Performance of Nonblocking Caches
Lockup-free caches in conjunction to non-blocking processor loads have been proposed to hide miss latencies in high performance processors. One problem with current approaches is the increased complexity of the processor and of the cache controller due to non-blocking. In this paper, we introduce a simple mechanism to support non-blocking loads and a lockup-free cache. A modified SPARC architec...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011